{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro to Pandas and Visualization\n", "\n", "In this example, we will explore the `DataFrame` type which is available in the `pandas` library, and some basics of how to use it. Then we will show how to do basic plots and charts using these `DataFrame`s. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Intro to Pandas `DataFrame`\n", "\n", "A `DataFrame` is basically a table that we can use as just another type of object in Python. The simplest way to construct a `DataFrame` (without loading from an external file) is to first create an empty `DataFrame`, then add some columns to that `DataFrame`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating `DataFrame`s\n", "\n", "As the `DataFrame` is in the `pandas` library, we will have to import that. The convention is to import it as `pd`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can go on to creating our first `DataFrame`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create First, Assign Columns Later" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: []" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we have seen with other custom types, we put the name of the class and then parenthes to construct it. While the `DataFrame` is empty, it doesn't look like much. Let's add some columns." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name
0Joe
1Jim
2Mary
\n", "
" ], "text/plain": [ " Name\n", "0 Joe\n", "1 Jim\n", "2 Mary" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Name'] = ['Joe', 'Jim', 'Mary']\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see we have now added a column `Name` to the `DataFrame`, with the values Joe, Jim, and Mary. Let's add three more columns." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
\n", "
" ], "text/plain": [ " Name Weight Reservation Price Percentage Active\n", "0 Joe 150 10.12 0.6\n", "1 Jim 200 15.17 0.4\n", "2 Mary 130 13.25 0.7" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Weight'] = [150, 200, 130]\n", "df['Reservation Price'] = [10.12, 15.17, 13.25]\n", "df['Percentage Active'] = [0.6, 0.4, 0.7]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have each row of the table representing a person, with the columns representing their name, weight, and price they would be willing to pay for some good. \n", "\n", "#### Create and Assign Columns at Same Time\n", "\n", "The `DataFrame` could have been constructed equivalently by passing a list of tuples to the constructor, where each tuple represents one row in the `DataFrame`. Then the column names are passed as a separate list:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
\n", "
" ], "text/plain": [ " Name Weight Reservation Price Percentage Active\n", "0 Joe 150 10.12 0.6\n", "1 Jim 200 15.17 0.4\n", "2 Mary 130 13.25 0.7" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(\n", " [\n", " ('Joe', 150, 10.12, 0.6),\n", " ('Jim', 200, 15.17, 0.4),\n", " ('Mary', 130, 13.25, 0.7)\n", " ],\n", " columns=['Name', 'Weight', 'Reservation Price', 'Percentage Active']\n", ")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting From `DataFrame`s\n", "\n", "Sort of like a dictionary, put the name of the column in brackets to access it:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 10.12\n", "1 15.17\n", "2 13.25\n", "Name: Reservation Price, dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Reservation Price']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One column of a `DataFrame` is called a `Series`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(df['Reservation Price'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Select multiple columns by passing a list:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Reservation PriceWeight
010.12150
115.17200
213.25130
\n", "
" ], "text/plain": [ " Reservation Price Weight\n", "0 10.12 150\n", "1 15.17 200\n", "2 13.25 130" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['Reservation Price', 'Weight']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Access individual rows by `.iloc` and the zero-based index of the row:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name Joe\n", "Weight 150\n", "Reservation Price 10.12\n", "Percentage Active 0.6\n", "Name: 0, dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Access from both columns and rows at the same time using `.loc`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10.12" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[0, 'Reservation Price']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also query the `DataFrame`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
0Joe15010.120.6
2Mary13013.250.7
\n", "
" ], "text/plain": [ " Name Weight Reservation Price Percentage Active\n", "0 Joe 150 10.12 0.6\n", "2 Mary 130 13.25 0.7" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['Reservation Price'] < 14]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read the above as \"The `DataFrame` where the `DataFrame`'s reservation price is less than 14\"." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
2Mary13013.250.7
\n", "
" ], "text/plain": [ " Name Weight Reservation Price Percentage Active\n", "2 Mary 130 13.25 0.7" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['Reservation Price'] < 14) & (df['Percentage Active'] > 0.6)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read the above as \"The `DataFrame` where the `DataFrame`'s reservation price is less than 14 and where the `DataFrame`'s percentage active is greater than 60%\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use `.loc` in combination with this query syntax to grab specific columns:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeight
2Mary130
\n", "
" ], "text/plain": [ " Name Weight\n", "2 Mary 130" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[\n", " (df['Reservation Price'] < 14) & (df['Percentage Active'] > 0.6),\n", " ['Name', 'Weight']\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Math with `DataFrame`s\n", "\n", "Basic math can be done with `DataFrame` columns." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 20.12\n", "1 25.17\n", "2 23.25\n", "Name: Reservation Price, dtype: float64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Reservation Price'] + 10" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 6.072\n", "1 6.068\n", "2 9.275\n", "dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Reservation Price'] * df['Percentage Active']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also, if the entire `DataFrame` is numbers, you can do math with that as well:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Reservation PriceWeight
0101.21500
1151.72000
2132.51300
\n", "
" ], "text/plain": [ " Reservation Price Weight\n", "0 101.2 1500\n", "1 151.7 2000\n", "2 132.5 1300" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['Reservation Price', 'Weight']] * 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `DataFrame` Summary Statistics\n", "\n", "There are some nice methods built in to `DataFrames` for summary info:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.describe` gives you all the summary statistics:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WeightReservation PricePercentage Active
count3.0000003.0000003.000000
mean160.00000012.8466670.566667
std36.0555132.5490460.152753
min130.00000010.1200000.400000
25%140.00000011.6850000.500000
50%150.00000013.2500000.600000
75%175.00000014.2100000.650000
max200.00000015.1700000.700000
\n", "
" ], "text/plain": [ " Weight Reservation Price Percentage Active\n", "count 3.000000 3.000000 3.000000\n", "mean 160.000000 12.846667 0.566667\n", "std 36.055513 2.549046 0.152753\n", "min 130.000000 10.120000 0.400000\n", "25% 140.000000 11.685000 0.500000\n", "50% 150.000000 13.250000 0.600000\n", "75% 175.000000 14.210000 0.650000\n", "max 200.000000 15.170000 0.700000" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Average with `.mean`:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Weight 160.000000\n", "Reservation Price 12.846667\n", "Percentage Active 0.566667\n", "dtype: float64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Standard deviations:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Weight 36.055513\n", "Reservation Price 2.549046\n", "Percentage Active 0.152753\n", "dtype: float64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Median: (other percentiles available)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Weight 150.00\n", "Reservation Price 13.25\n", "Percentage Active 0.60\n", "Name: 0.5, dtype: float64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.quantile(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Minimum:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name Jim\n", "Weight 130\n", "Reservation Price 10.12\n", "Percentage Active 0.4\n", "dtype: object" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Maximum:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Name Mary\n", "Weight 200\n", "Reservation Price 15.17\n", "Percentage Active 0.7\n", "dtype: object" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Any of these can be applied on a row instead of a column:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 53.573333\n", "1 71.856667\n", "2 47.983333\n", "dtype: float64" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply Any Function to a `DataFrame`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can apply any arbitrary function to each of the values in the `DataFrame` using `.applymap`." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
0JoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJ...150001012.060.0
1JimJimJimJimJimJimJimJimJimJimJimJimJimJimJimJ...200001517.040.0
2MaryMaryMaryMaryMaryMaryMaryMaryMaryMaryMaryMa...130001325.070.0
\n", "
" ], "text/plain": [ " Name Weight \\\n", "0 JoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJoeJ... 15000 \n", "1 JimJimJimJimJimJimJimJimJimJimJimJimJimJimJimJ... 20000 \n", "2 MaryMaryMaryMaryMaryMaryMaryMaryMaryMaryMaryMa... 13000 \n", "\n", " Reservation Price Percentage Active \n", "0 1012.0 60.0 \n", "1 1517.0 40.0 \n", "2 1325.0 70.0 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def multiply_by_100(value):\n", " return value * 100\n", "\n", "df.applymap(multiply_by_100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though be careful the function works with all your data types, you can see this one accidentally repeated the names 100x!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `DataFrame` Styling\n", "\n", "It is possible to style `DataFrames` with static and conditional formatting. In general, after styling, a `Styler` object is returned. This displays as the styled `DataFrame` but is not actually a `DataFrame`. So you will want to do your styling last, just for display, not on an intermediate `DataFrame` that you're using for calculations.\n", "\n", "See [the `pandas` guide on styling here.](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html) \n", "\n", "### Number Formatting\n", "\n", "We can set number formatting in a `DataFrame` much in the same way as we would set it in an f-string. Here is the f-string version for review:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'$10.12'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_num = 10.12\n", "f'${my_num:,.2f}'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now here it is applied to the `DataFrame` column `Reservation Price`. You can see that the format code looks the same, only we omit the variable name before the colon.\n", "\n", "The actual call to format looks a bit different. Here we must pass to `df.style.format` a dictionary whose keys are the column names and values are the format specifier strings." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe150$10.120.6
1Jim200$15.170.4
2Mary130$13.250.7
" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = df.style.format({\n", " 'Reservation Price': \"${:,.2f}\"\n", "})\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that this formatted the `Reservation Price` column with the format we specified. We can also see that the type of this object is `Styler`. \n", "\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.io.formats.style.Styler" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now do this for `Percentage Active` as well." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe150$10.1260%
1Jim200$15.1740%
2Mary130$13.2570%
" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = s.format({\n", " 'Percentage Active': '{:.0%}'\n", "})\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can either chain additional format calls on the existing `Styler`, as shown above, or we can do it from the original `DataFrame` passing both formats at once:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe150$10.1260%
1Jim200$15.1740%
2Mary130$13.2570%
" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = df.style.format({\n", " 'Reservation Price': \"${:,.2f}\",\n", " 'Percentage Active': '{:.0%}'\n", "})\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cell Formatting\n", "\n", "We can apply any formatting to each individual cell. To apply the formatting cell by cell, use `.applymap` on the `Styler` object. It works the same as `DataFrame.applymap`. The formatting function passed to `Styler.applymap` must return a string. The contents of this string should be CSS properties for the HTML representation of the `DataFrame`, separated by semicolons. Typically, `color`, `background-color`, and `text-align` are the three needed properties. \n", "\n", "So you can return `color: blue` to turn the text blue:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def set_color_to_blue(value):\n", " return 'color: blue'\n", "\n", "df.style.applymap(set_color_to_blue)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: As it's CSS, you can use absolutely any color you want by [specifying its hex or RGB code.](https://www.w3schools.com/colors/colors_picker.asp) Or you can pick from a list of [predefined color names here.](https://www.w3schools.com/cssref/css_colors.asp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can set the background color to light green:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def set_bg_to_light_green(value):\n", " return 'background-color: lightgreen'\n", "\n", "df.style.applymap(set_bg_to_light_green)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or center the cells:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def center_cell(value):\n", " return 'text-align: center'\n", "\n", "df.style.applymap(center_cell)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can set the background color, and text color, and text alignment all at once:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def set_color_to_white_on_black_centered(value):\n", " return 'color: white; background-color: black; text-align: center'\n", "\n", "df.style.applymap(set_color_to_white_on_black_centered)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conditional Formatting\n", "\n", "We have seen already we are using arbitrary functions to apply the styling to the `DataFrame`s. It is trivial to add some conditional logic to these functions to use conditional formatting. For example, let's highlight in red anyone who is active less than 50% of the time: " ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def highlight_inactive(value):\n", " \n", " # This section needed to handle Name column\n", " if isinstance(value, str): # isinstance checks: is value of type str?\n", " return ''\n", " \n", " # Main logic\n", " if value < 0.5:\n", " return 'background-color: pink'\n", " return ''\n", "\n", "df.style.applymap(highlight_inactive)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the function operates one cell at a time. If we return an empty string, then that cell will not get formatted. If it's ever not clear which styles are getting applied where, you can directly do an `.applymap` on the `DataFrame` rather than the `Styler` to see where they are getting applied:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
0
1background-color: pink
2
\n", "
" ], "text/plain": [ " Name Weight Reservation Price Percentage Active\n", "0 \n", "1 background-color: pink\n", "2 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.applymap(highlight_inactive)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overall Table Formatting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can hide the index if it's not useful." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
Joe15010.120.6
Jim20015.170.4
Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.hide_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also hide columns you don't need in the presentation:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Reservation Price Percentage Active
0Joe10.120.6
1Jim15.170.4
2Mary13.250.7
" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.hide_columns(['Weight'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can add a title to your `DataFrame`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
My Table
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.set_caption('My Table')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also change the overall look of the table. This is considered an advanced feature and I will only just mention it and not cover it in detail. Here is the example from the `pandas` documentation:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Hover to highlight.
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def hover(hover_color=\"#ffff99\"):\n", " return dict(selector=\"tr:hover\",\n", " props=[(\"background-color\", hover_color)])\n", "\n", "styles = [\n", " hover(),\n", " dict(selector=\"th\", props=[(\"font-size\", \"150%\"),\n", " (\"text-align\", \"center\")]),\n", " dict(selector=\"caption\", props=[(\"caption-side\", \"bottom\")])\n", "]\n", "df.style.set_table_styles(styles).set_caption(\"Hover to highlight.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inline Bar Charts\n", "\n", "While this would have been possible with conditional formatting alone, `pandas` also provides a convenient method to create a bar graph within `DataFrame` columns:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.bar()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the bars are longer when the value in the column is greater. There are also different alignments for the bar." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.bar(align='zero', color='lightblue')" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.bar(align='mid', color='lightgreen', subset=['Reservation Price'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to Organize Styling Code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that none of this styling that we've applied in various spots has come back to the `DataFrame` itself:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameWeightReservation PricePercentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
\n", "
" ], "text/plain": [ " Name Weight Reservation Price Percentage Active\n", "0 Joe 150 10.12 0.6\n", "1 Jim 200 15.17 0.4\n", "2 Mary 130 13.25 0.7" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have multiple `DataFrame`s you want to format in the same fashion, then you can write a function for it:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "def styled_df(df):\n", " s = ( # parentheses as group to split onto multiple lines\n", " df.style.format({\n", " 'Reservation Price': \"${:,.2f}\",\n", " 'Percentage Active': '{:.0%}'\n", " })\n", " .applymap(highlight_inactive)\n", " .applymap(center_cell)\n", " .hide_index()\n", " .set_caption('Personal Info')\n", " .bar(align='mid', color='lightgreen', subset=['Reservation Price'])\n", " )\n", " return s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can apply this formatting to our original `DataFrame` to view it:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Personal Info
Name Weight Reservation Price Percentage Active
Joe150$10.1260%
Jim200$15.1740%
Mary130$13.2570%
" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "styled_df(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also apply it to other `DataFrame`s now that we have a general function." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Personal Info
Reservation Price Percentage Active
$8.1048%
$12.1432%
$10.6056%
" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "styled_df(df[['Reservation Price', 'Percentage Active']] * 0.8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some Common Formatting Shortcuts\n", "\n", "You can accomplish all cell and number formatting with `.format`, and `.applymap`. But there are some shortcuts for common design patterns:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Highlighting the max value in a column:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.highlight_max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with other commands, you can specify colors or subset of columns." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.highlight_max(color='lightblue', subset=['Weight', 'Reservation Price'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Highlighting the minimum value in a column:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.highlight_min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same style options supported." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.highlight_min(color='lightblue', subset=['Weight', 'Reservation Price'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make a gradient based on the values within a column. Here it's getting darker for a higher value by default:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.background_gradient()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the color of the gradient to green for high, yellow for mid, and red for low." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name Weight Reservation Price Percentage Active
0Joe15010.120.6
1Jim20015.170.4
2Mary13013.250.7
" ], "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.style.background_gradient(cmap='RdYlGn')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 4 }